Robust Localization of Multiple Speech Sources Based on Time Difference of Arrival in Real Environments for Binaural Robot Audition
Authors
Abstract
This paper presents a multisource speech localization method for binaural robot audition, based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT). Direction-of-arrival (DOA) estimation based on GCC-PHAT was extended with a signal-to-noise ratio (SNR)-based weighting function so that multiple DOAs can be estimated simultaneously. The standard K-means clustering algorithm was adapted to multisource speech localization by adding two steps: one that increases the number of clusters automatically and one that eliminates clusters containing incorrect DOA estimates. Experiments conducted on the SIG-2 humanoid robot in real environments show that the method can localize multiple speech sources in real time with a localization error below 5.96°.
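As a rough illustration of the GCC-PHAT building block named in the abstract, the following Python sketch estimates a single time difference of arrival between two microphone signals and converts it to an azimuth. It is not the paper's method: it omits the SNR-based weighting and the K-means clustering step, handles only one source, and the delay-to-angle conversion assumes a far-field source and two microphones in free space (no head diffraction). The function names, the microphone spacing, and the speed of sound value are assumptions made for this example.

```python
import numpy as np


def gcc_phat_tdoa(left, right, fs, max_tau=None):
    """Estimate the delay of `right` relative to `left`, in seconds.

    A positive result means the sound reached the right channel later.
    """
    n = len(left) + len(right)          # zero-pad to avoid circular wrap-around
    spec_left = np.fft.rfft(left, n=n)
    spec_right = np.fft.rfft(right, n=n)

    # Cross-power spectrum with PHAT weighting: keep the phase, drop the magnitude.
    cross = spec_right * np.conj(spec_left)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs


def tdoa_to_azimuth(tau, mic_distance, speed_of_sound=343.0):
    """Far-field, free-field conversion (an assumption): sin(theta) = c * tau / d."""
    s = np.clip(tau * speed_of_sound / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    left = rng.standard_normal(4096)
    right = np.concatenate((np.zeros(5), left[:-5]))  # right lags by 5 samples
    tau = gcc_phat_tdoa(left, right, fs, max_tau=0.2 / 343.0)
    print(tau * fs, tdoa_to_azimuth(tau, mic_distance=0.2))
```

In a multisource setting, one would typically compute such estimates frame by frame and then group them, which is the role the abstract assigns to the modified K-means step.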
Similar resources
Improvement of Sound Source Localization for a Binaural Robot of Spherical Head with Pinnae
II diffraction problem was overcome by incorporating a new time delay factor into the GCC-PHAT method under the assumption of a spherical robot head. The ambiguity problem was overcome by utilizing the amplification effect of the pinnae. Finally, the difficulties with multisource sound localization in real environments were addressed by extending the proposed ML-based SSL method using the new ti...
Spatial Hearing Algorithms Based on Binaural Zero-Crossings: Sound Source Localization, Segregation, and Dereverberation
This thesis concerns a new zero-crossing-based binaural model for spatial hearing. The conventional binaural model computes cross-correlations of binaural signals to estimate the interaural time difference, which is a primary spatial cue. However, the cross-correlation-based binaural processing model requires high computational complexity and suffers from inaccuracies in localizing sound so...
Improved binaural sound localization and tracking for unknown time-varying number of speakers
Applying scattering theory to robot audition system: robust sound source localization and extraction
Robot audition by its own ears (microphones) is essential for natural human-robot communication and interface. Since a microphone is embedded in the head of a robot, the head-related transfer function (HRTF) plays an important role in sound source localization and extraction. Usually, from binaural input, the interaural phase difference (IPD) and interaural intensity difference (IID) are calcul...
Effects of Moving Landmark’s Speed on Multi-Robot Simultaneous Localization and Mapping in Dynamic Environments
Although simultaneous localization and mapping (SLAM) solutions have been broadly developed, the vast majority of them relate to a single robot performing measurements in static environments. Research shows that the performance of SLAM algorithms deteriorates in dynamic environments. In this paper, a multi-robot simultaneous localization and mapping (MR-SLAM) system is implemented within a...
Publication date: 2013